Digital Humanities Asia Workshop

Stylometrics and Genre Research in Imperial Chinese Studies

Python Basics

Paul Vierthaler, Boston College

@pvierth, vierthal@bc.edu

Python Basics

We will need to know a little bit about Python in order for the following tutorial to make sense. This will not be a comprehensive introduction to Python. It will be just enough to get us started.

We are using Python 3.

You will still come across code written for Python 2.7, but we are using Python 3 because its strings are Unicode by default, which makes it much easier to work with Chinese text. Python 2.7 is no longer maintained; it reached its end of life on January 1, 2020.

Python is case sensitive.

A is not the same thing as a.
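A minimal illustration; the names a and A are arbitrary:

```python
# Python treats "a" and "A" as two different variables
a = 1
A = 2
print(a, A)    # prints: 1 2
print(a == A)  # prints: False
```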

Python doesn't like Chinese punctuation.

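Your code will use quotation marks (and commas) throughout, and Python only understands the standard half-width versions. Be careful: if you type commas or quotation marks while a Chinese input method is active, you will get full-width characters such as ， and “ ” that Python does not know how to handle. Inside a string, full-width punctuation is fine, because there it is just text.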
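The sketch below shows the difference. It uses the built-in compile() function, which checks whether a piece of text is valid Python without running it; the sample snippets are made up for illustration.

```python
# Full-width punctuation is fine INSIDE a string, because there it is just text
title = "紅樓夢，第一回"
print(title)

# Used as code, full-width punctuation is a syntax error.
# compile() checks whether a snippet is valid Python without running it.
snippets = [
    "print('hello')",     # half-width quotation marks: valid
    "print(“hello”)",     # full-width quotation marks: invalid
    "print('a'，'b')",    # full-width comma: invalid
]
errors = []
for snippet in snippets:
    try:
        compile(snippet, "<example>", "exec")
        print("OK:         ", snippet)
    except SyntaxError:
        errors.append(snippet)
        print("SyntaxError:", snippet)
```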

A # (hashtag) starts a comment.

Python ignores everything from the # to the end of the line, so comments let you tell someone reading your code what it does.
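For example (the variable names here are only illustrative):

```python
# A comment can take up a whole line.
count = 3  # ...or follow code on the same line.

# A # inside quotation marks is part of the string, not a comment.
tag = "#stylometry"
print(count, tag)  # prints: 3 #stylometry
```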

Variables

A variable can be thought of as a named container for information. You have to create (assign) a variable before you can use it, and in Python a variable can store any type of information. We can name variables ALMOST anything we want, with a few rules: avoid Python's reserved keywords (if, or, for, etc.) and the names of built-in functions such as print or list; a name cannot start with a number; and, by convention, variable names start with a lowercase letter.
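If you are unsure whether a name is allowed, Python can check it for you. This sketch uses the standard keyword module and the str.isidentifier() method; the candidate names are just examples.

```python
import keyword

candidates = ["my_text", "chapter_2", "2nd_chapter", "if", "for"]
for name in candidates:
    # isidentifier() checks the spelling rules (e.g. a name may not start with a digit);
    # iskeyword() checks whether Python reserves the word for itself
    usable = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "->", "OK" if usable else "not allowed")
```

Note that names of built-in functions such as print pass both checks, so avoiding those is up to you.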

Variables allow us to save information to reuse it later.


In [2]:
# You can store integers
x = 10

# You can store strings
y = "Hi, my name is Paul"

# A variable can be as long as you like. It is best to use variable names
# that express what the variable is.
long_variable_names_work_too = 1.3

hi = 'hello'
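Once a value is stored, you can reuse it, or replace it by assigning again. A short sketch (the numbers are arbitrary):

```python
x = 10

# Reuse the stored value in a calculation
doubled = x * 2
print(doubled)  # prints: 20

# Assigning again replaces the old value
x = x + 5
print(x)        # prints: 15

# type() tells you what kind of information a value holds
print(type(x), type("Hi"), type(1.3))  # prints: <class 'int'> <class 'str'> <class 'float'>
```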

Printing

The print() function displays something in the console (in Jupyter, directly below the cell). It does not send anything to your printer.


In [4]:
print("It will change")


It will change
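print() can also take several values at once. A small sketch; sep is a standard parameter of print(), and the sample strings are arbitrary:

```python
title = "紅樓夢"
author = "曹雪芹"

# Several values are printed on one line, separated by spaces
print("Title:", title, "Author:", author)  # prints: Title: 紅樓夢 Author: 曹雪芹

# The sep parameter changes the separator
print(title, author, sep=" / ")            # prints: 紅樓夢 / 曹雪芹
```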

A Few Data Types

Integers

Whole numbers. These behave a bit differently than you might expect when you divide them. In Python 3, a single slash (/) always performs regular division and returns a float, even when both numbers are integers. Two slashes (//) perform integer (floor) division, which drops the remainder and rounds down.


In [3]:
# Here are some integers (Jupyter displays only the value of the last line):
2
5
5000


Out[3]:
5000

In [4]:
# Here is some regular division:
5/2


Out[4]:
2.5

In [5]:
# Here is some integer division:
5//2


Out[5]:
2
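Integer division is often paired with the remainder operator %, which the cells above do not show. A minimal sketch with arbitrary numbers:

```python
# // gives the whole-number part of a division, % gives the remainder
quotient = 17 // 5
remainder = 17 % 5
print(quotient, remainder)  # prints: 3 2

# divmod() returns both at once
print(divmod(17, 5))        # prints: (3, 2)

# // rounds DOWN, so a negative result moves toward the smaller number
print(-17 // 5)             # prints: -4
```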

Floats

Floating-point numbers (decimal numbers). These need to be treated with care as well, because they are approximations: under the hood the computer stores them in binary, and many decimal values (such as 0.1) have no exact binary representation.


In [6]:
# Here are some floating point numbers:
1.4
200.12
.008


Out[6]:
0.008

In [7]:
# Here is some floating point number division:
1000.15/13


Out[7]:
76.93461538461538

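Note the long run of trailing digits. A float holds only about 15 to 17 significant digits, so results like this are approximations. Be careful when comparing or rounding them.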
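A classic example of this imprecision, and one common way to handle it with the standard math module:

```python
import math

total = 0.1 + 0.2
print(total)                     # prints: 0.30000000000000004
print(total == 0.3)              # prints: False

# Compare floats with a tolerance instead of ==
print(math.isclose(total, 0.3))  # prints: True

# round() is handy for displaying results
print(round(1000.15 / 13, 2))    # prints: 76.93
```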

Strings

Text. Strings are written between either single or double quotation marks.


In [8]:
"This is a string."
'This is also a string.'


Out[8]:
'This is also a string.'
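A few more things you can do with strings; the sample text is arbitrary:

```python
# Use one kind of quotation mark to enclose the other
quote = 'He said, "hello."'
apostrophe = "Paul's notebook"
print(quote)
print(apostrophe)

# Strings can be joined with + and measured with len()
title = "紅樓" + "夢"
print(title)       # prints: 紅樓夢
print(len(title))  # prints: 3  (Python 3 counts characters, not bytes)
```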

Getting a substring

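You can get a single character or a substring by referring to the index (position) of the characters you want. Python is zero-indexed, meaning the first character is at index 0, not 1. A slice such as [11:15] starts at the first index and stops just before the second; a negative index counts backward from the end of the string.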


In [9]:
my_string = "This is my string."
print(my_string[0])
print(my_string[11:15])
print(my_string[-4:])


T
stri
ing.
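A slice can also leave out its start or end, or take a third number as a step. This sketch reuses the string from the cell above and adds a sample line of Chinese text:

```python
my_string = "This is my string."

print(my_string[:4])    # prints: This  (start omitted: from the beginning)
print(my_string[8:10])  # prints: my    (index 10 itself is excluded)
print(my_string[::2])   # every second character
print(my_string[-1])    # prints: .     (last character)

# Slicing works the same way on Chinese text
line = "滿紙荒唐言"
print(line[2:])         # prints: 荒唐言
```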

Boolean Values

True and False are important values. They allow us to create logic in our programs.

Checking Boolean Values

<     is less than
>     is greater than
<=    is less than or equal to
>=    is greater than or equal to
==    is equal to
!=    is not equal to


In [10]:
print(1<5)
print(2>5)
print(4==4)


True
False
True
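The remaining operators work the same way, and comparisons also apply to strings. Illustrative values only:

```python
print(3 <= 3)  # prints: True
print(5 >= 7)  # prints: False
print(4 != 4)  # prints: False

# Comparisons also work on strings, and the result can be stored in a variable
same_title = "紅樓夢" == "紅樓夢"
print(same_title)          # prints: True
print("Tiger" == "tiger")  # prints: False (case matters)
```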

Lists

A list stores a collection of items in order. Lists are very useful and are written with square brackets, with the items separated by commas.


In [11]:
# This is an empty list:
[]

# This is a list with some information.
[1, 2, 3, 4, 5, 6]


Out[11]:
[1, 2, 3, 4, 5, 6]
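Lists usually start empty and grow as a program runs. A short sketch using the list append() method; the dynasty names are only sample data:

```python
dynasties = []            # start with an empty list
dynasties.append("Ming")  # add an item to the end
dynasties.append("Qing")
print(dynasties)          # prints: ['Ming', 'Qing']
print(len(dynasties))     # prints: 2

# A list can hold a mix of types
mixed = [1, "two", 3.0]
print(mixed)              # prints: [1, 'two', 3.0]
```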

Retrieving Information From Lists

You can get information out of a list by putting an item's index in square brackets. Because Python is zero-indexed, the first element is at index 0, not 1.


In [12]:
numbers = [1,2,3,4,5,6]
print(numbers[0])


1
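Lists support the same negative indexes and slices as strings, plus membership tests with in. A sketch:

```python
numbers = [1, 2, 3, 4, 5, 6]
print(numbers[-1])   # prints: 6       (last item)
print(numbers[1:3])  # prints: [2, 3]  (slices work as with strings)
print(4 in numbers)  # prints: True    (membership test)

# Asking for an index past the end raises an IndexError
try:
    numbers[6]
except IndexError:
    print("There is no item at index 6")
```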

Dictionaries

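Dictionaries also store information, but instead of numbered positions they use keys to look up values. They are written with curly brackets, with each key separated from its value by a colon. (Since Python 3.7, dictionaries remember the order in which keys were added, but you retrieve values by key, not by position.)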


In [13]:
# This is an empty dictionary:
{}

# This is a dictionary with some information:
{"Independence Day":"July 4th", "Halloween":"October 31st", "Labor Day 2016":"September 5th"}


Out[13]:
{'Halloween': 'October 31st',
 'Independence Day': 'July 4th',
 'Labor Day 2016': 'September 5th'}

Retrieving information from a dictionary

To look up a value, put its key in square brackets:


In [14]:
holiday_dates = {"Independence Day":"July 4th", "Halloween":"October 31st", "Labor Day 2016":"September 5th"}

print(holiday_dates["Halloween"])


October 31st
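Dictionaries can also be changed after they are created. This sketch adds a new key and uses the dict get() method, which returns a fallback value instead of raising a KeyError when a key is missing; the holidays are sample data:

```python
holiday_dates = {"Independence Day": "July 4th", "Halloween": "October 31st"}

# Assigning to a new key adds an entry (assigning to an existing key replaces it)
holiday_dates["New Year's Day"] = "January 1st"
print(len(holiday_dates))                      # prints: 3

# Check whether a key exists with "in"
print("Halloween" in holiday_dates)            # prints: True

# get() returns the second argument when the key is missing
print(holiday_dates.get("Easter", "unknown"))  # prints: unknown
```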

Python Structure

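Python uses indentation to mark code blocks, where some other languages use braces or keywords such as "end". Every line in a block must be indented by the same amount; four spaces is the convention.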
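A minimal sketch of how indentation marks a block. It uses an if statement (which runs its block only when its condition is True) and the built-in compile() function to check a badly indented snippet without running it:

```python
x = 7
if x > 5:
    print("x is large")          # indented: inside the if block
    print("still in the block")  # same indentation: same block
print("outside the block")       # not indented: runs either way

# A block with no indentation is an error
caught = False
try:
    compile("if x > 5:\nprint('x is large')\n", "<example>", "exec")
except IndentationError:
    caught = True
print("IndentationError raised:", caught)  # prints: IndentationError raised: True
```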

Loops

Loops allow you to run the same piece of code over and over.

While loops

While a condition is true, Python executes the code inside the block, then checks the condition again:


In [15]:
i = 0
while i < 4:
    print(i)
    
    # Increase i by one. This can also be written i += 1
    i = i + 1


0
1
2
3
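Another small sketch: a while loop that adds up the numbers 1 through 10. If you forget to increase the counter, the condition never becomes False and the loop runs forever.

```python
total = 0
n = 1
while n <= 10:
    total += n  # same as total = total + n
    n += 1      # move the counter forward so the loop can end
print(total)    # prints: 55
```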

For loops

A for loop goes through each item in a list (or any other iterable object, such as a string) one at a time.


In [16]:
animals = ["tiger", "lion", "monkey", "pig"]
for animal in animals:
    print(animal)


tiger
lion
monkey
pig
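Because a string is iterable too, a for loop can step through a text one character at a time. Combined with a dictionary, this gives a simple character-frequency count, one kind of tally that stylometric analysis builds on. A sketch; the sample text is the opening of the Daodejing with punctuation removed, and enumerate() is a built-in that pairs each item with its index:

```python
text = "道可道非常道名可名非常名"

counts = {}
for char in text:
    # get(char, 0) returns 0 the first time a character is seen
    counts[char] = counts.get(char, 0) + 1
print(counts)  # prints: {'道': 3, '可': 2, '非': 2, '常': 2, '名': 3}

# enumerate() gives each item's index along with the item
animals = ["tiger", "lion", "monkey", "pig"]
for index, animal in enumerate(animals):
    print(index, animal)
```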

Libraries

We can use code other people have written by importing libraries, which extend Python's basic functionality. Libraries are not loaded into the namespace automatically, for efficiency reasons; you import the ones you need. Anaconda comes with many libraries that will make our lives much easier, and we will import them as needed.


In [17]:
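import math  # mathematical functions
import os    # working with files and folders
import re    # regular expressions (pattern matching in text)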
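A quick sketch of one function from each of these libraries. math.sqrt(), os.path.join(), and re.findall() are standard-library functions; the folder and file names are hypothetical:

```python
import math
import os
import re

print(math.sqrt(16))  # prints: 4.0

# os.path.join() builds a file path with the separator your operating system uses
# ("texts" and "hongloumeng.txt" are hypothetical names)
path = os.path.join("texts", "hongloumeng.txt")
print(path)

# re.findall() returns every match of a pattern; \d+ matches runs of digits
numbers = re.findall(r"\d+", "Chapter 12, page 340")
print(numbers)        # prints: ['12', '340']
```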
